Remove clustalw, set fixed raxml memory #72

Open: natefoo wants to merge 2 commits into main

Conversation

@natefoo (Member) commented Oct 15, 2024

I am not sure what is going on here but:

  • AU allocates from 11.5 to 30.7 GB and multiple cores, although I don't believe the wrapper supports multiple cores (a quick way to check is sketched just after this list).
  • In the shared DB (and on .org) we allocate 34 GB.
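
Assuming the core job metrics plugin is enabled so that galaxy_slots is recorded (an assumption about each instance's config), a quick tally of the slot counts actually handed to clustalw jobs would settle the multi-core question:

gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1' galaxy_slots --ok | awk '{print $1}' | sort -n | uniq -c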

In practice on .org the histogram looks like:

ndc@galaxy-db% gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1' memory.peak --ok | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram
(   0.221,    0.466) n=6472  **************************************************
[   0.466,    0.711) n=131   *
[   0.711,    0.956) n=86
[   0.956,    1.201) n=47
[   1.201,    1.446) n=24
[   1.446,    1.691) n=30
[   1.691,    1.936) n=22
[   1.936,    2.181) n=23
[   2.181,    2.426) n=8
[   2.426,    2.671) n=8
[   2.671,    2.916) n=4
[   2.916,    3.161) n=4
[   3.161,    3.406) n=1
[   3.406,    3.651) n=6
[   3.651,    3.896) n=2
[   3.896,    4.141) n=4
[   4.141,    4.386) n=1
[   4.386,    4.631) n=0
[   4.631,    4.876) n=0
[   4.876,    5.121) n=2
[   5.121,    5.367) n=0
[   5.367,    5.612) n=1
[   5.612,    5.857) n=1
[   5.857,    6.102) n=1
[   6.102,    6.347) n=2
[   6.347,    6.592) n=1
[   6.592,    6.837) n=0
[   6.837,    7.082) n=0
[   7.082,    7.327) n=1
[   7.327,    7.572) n=0
[   7.572,    7.817) n=1
[   7.817,    8.062) n=0
[   8.062,    8.307) n=0
[   8.307,    8.552) n=0
[   8.552,    8.797) n=0
[   8.797,    9.042) n=0
[   9.042,    9.287) n=0
[   9.287,    9.532) n=0
[   9.532,    9.777) n=0
[   9.777,   10.022) n=1

Maybe old versions were much more inefficient? But it does not seem like we need to allocate anything special for the current version. I would be interested to see what memory usage looks like at EU and AU.

Also, with larger inputs the tool runs forever; AU rejects anything >= 40 MB. I will update this if I can determine the rough input size at which it starts running forever for us; maybe that limit is worth including in the shared DB?
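
Relatedly, the core job metrics plugin records runtime_seconds, so the runtime spread can be eyeballed with the same pipeline (a hypothetical sketch; it only covers jobs that finished ok, so the runs-forever cases won't appear in it):

gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1' runtime_seconds --ok | awk '{print $1 / 3600}' | gxadmin filter histogram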

raxml wants 16 cores, but because of the memory factor that most people use in their default tool (see #73), it is probably also going to request ~64 GB of memory despite only using ~2 GB:

ndc@galaxy-db% gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2%' memory.peak --ok --like | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram
(   0.226,    2.336) n=554   **************************************************
[   2.336,    4.446) n=8
[   4.446,    6.556) n=5
[   6.556,    8.666) n=5
[   8.666,   10.776) n=0
[  10.776,   12.887) n=1
[  12.887,   14.997) n=2
[  14.997,   17.107) n=0
[  17.107,   19.217) n=1
[  19.217,   21.327) n=0
[  21.327,   23.437) n=2
[  23.437,   25.548) n=1
[  25.548,   27.658) n=0
[  27.658,   29.768) n=0
[  29.768,   31.878) n=0
[  31.878,   33.988) n=1
[  33.988,   36.098) n=0
[  36.098,   38.208) n=1

@natefoo (Member, Author) commented Oct 15, 2024

I have very few "ok" clustalw jobs with inputs over 60 MB.
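
For the record, a sketch of how this can be measured straight from the Galaxy database, using the standard schema (job, job_to_input_dataset, history_dataset_association, dataset); the database name and the exact 60 MB cutoff here are illustrative:

psql galaxy -c "
SELECT count(*) FROM (
    SELECT j.id
    FROM job j
    JOIN job_to_input_dataset jtid ON jtid.job_id = j.id
    JOIN history_dataset_association hda ON hda.id = jtid.dataset_id
    JOIN dataset d ON d.id = hda.dataset_id
    WHERE j.tool_id = 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1'
      AND j.state = 'ok'
    GROUP BY j.id
    HAVING sum(d.total_size) >= 60 * 1024 * 1024
) AS big_ok_jobs;"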

@natefoo changed the title from "Remove clustalw" to "Remove clustalw, set fixed raxml memory" on Oct 15, 2024
@bgruening (Member) commented:

galaxy@sn06:~$ gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1' memory.peak --ok | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram


(   0.215,    0.716) n=2712  **************************************************
[   0.716,    1.217) n=4     
[   1.217,    1.718) n=15    
[   1.718,    2.219) n=4     
[   2.219,    2.721) n=2     
[   2.721,    3.222) n=1     
[   3.222,    3.723) n=1     
[   3.723,    4.224) n=0     
[   4.224,    4.725) n=0     
[   4.725,    5.227) n=0     
[   5.227,    5.728) n=0     
[   5.728,    6.229) n=0     
[   6.229,    6.730) n=0     
[   6.730,    7.231) n=0     
[   7.231,    7.733) n=0     
[   7.733,    8.234) n=0     
[   8.234,    8.735) n=0     
[   8.735,    9.236) n=0     
[   9.236,    9.737) n=0     
[   9.737,   10.239) n=0     
[  10.239,   10.740) n=0     
[  10.740,   11.241) n=0     
[  11.241,   11.742) n=0     
[  11.742,   12.243) n=0     
[  12.243,   12.745) n=0     
[  12.745,   13.246) n=0     
[  13.246,   13.747) n=1     
[  13.747,   14.248) n=1     
[  14.248,   14.750) n=0     

@bgruening (Member) commented:

galaxy@sn06:~$ gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2%' memory.peak --ok --like | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram
(   0.219,    0.558) n=57    **************************************************
[   0.558,    0.897) n=1     
[   0.897,    1.236) n=2     *
[   1.236,    1.575) n=1     
[   1.575,    1.914) n=3     **
[   1.914,    2.253) n=0     
[   2.253,    2.592) n=0     
[   2.592,    2.931) n=1     
[   2.931,    3.270) n=1     
[   3.270,    3.609) n=1    

@cat-bro (Collaborator) commented Oct 15, 2024

On AU, for clustalw:

$ gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1+galaxy1' memory.max_usage_in_bytes --ok | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram
(   0.018,    1.083) n=1451  **************************************************
[   1.083,    2.147) n=14
[   2.147,    3.212) n=4
[   3.212,    4.277) n=2
[   4.277,    5.341) n=0
[   5.341,    6.406) n=0
[   6.406,    7.471) n=0
[   7.471,    8.535) n=0
[   8.535,    9.600) n=0
[   9.600,   10.665) n=0
[  10.665,   11.729) n=0
[  11.729,   12.794) n=0
[  12.794,   13.859) n=0
[  13.859,   14.923) n=1
[  14.923,   15.988) n=1
[  15.988,   17.053) n=0
[  17.053,   18.117) n=0
[  18.117,   19.182) n=0
[  19.182,   20.246) n=0
[  20.246,   21.311) n=0
[  21.311,   22.376) n=0
[  22.376,   23.440) n=0
[  23.440,   24.505) n=0
[  24.505,   25.570) n=1

@cat-bro (Collaborator) commented Oct 15, 2024

For the latest raxml:

cat@galaxy:~$ gxadmin tsvquery tool-metrics 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2.12+galaxy1' memory.max_usage_in_bytes --ok | awk '{print $1 / 1024 / 1024 / 1024}' | gxadmin filter histogram
(   0.030,    8.196) n=336   **************************************************
[   8.196,   16.361) n=0
[  16.361,   24.527) n=1
[  24.527,   32.692) n=0
[  32.692,   40.857) n=1
[  40.857,   49.023) n=1
[  49.023,   57.188) n=1
[  57.188,   65.354) n=0
[  65.354,   73.519) n=0
[  73.519,   81.684) n=0
[  81.684,   89.850) n=0
[  89.850,   98.015) n=0
[  98.015,  106.181) n=0
[ 106.181,  114.346) n=0
[ 114.346,  122.511) n=1

This gxadmin query/filter combo is extremely neat.
For some reason like isn't working for me: 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2%' like 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2.12+galaxy1' evaluates to false, but if I use ~ instead of like it evaluates to true.
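
One thing worth ruling out is operand order: SQL's test is string LIKE pattern, not the reverse, so swapping the operands flips the result. A quick check at any PostgreSQL prompt (plain psql, nothing Galaxy-specific):

psql -c "SELECT 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2.12+galaxy1' LIKE 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2%';"   # t
psql -c "SELECT 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2%' LIKE 'toolshed.g2.bx.psu.edu/repos/iuc/raxml/raxml/8.2.12+galaxy1';"  # f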

Thanks for this @natefoo! The clustalw settings here probably came from AU and may not have changed since the dynamic tool destinations (dtd) days.
